INAOE's Participation at PAN'13: Author Profiling Task Notebook for PAN at CLEF 2013

Authors

  • Adrián Pastor López-Monroy
  • Manuel Montes-y-Gómez
  • Hugo Jair Escalante
  • Luis Villaseñor Pineda
  • Esaú Villatoro-Tello
Abstract

This paper describes the participation of the Laboratory of Language Technologies of INAOE at the PAN 2013 evaluation lab. We adopted second-order representations to address the problem of Author Profiling (AP). This representation tackles two shortcomings of the typical Bag-of-Terms: i) the sparsity and high dimensionality of document representations, and ii) the assumption of total independence between terms in documents. To overcome these problems, the proposed representation builds document vectors in a space of profiles, so that each vector captures the relationship of a document with the different profiles (say, age and gender). To evaluate our approach, we compare the proposed representation against a standard Bag-of-Terms representation using the PAN 2013 corpus for AP. We found that the second-order attributes, at a low computational cost, show evidence of being useful for determining gender and age profiles.
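The idea of building document vectors in a space of profiles can be illustrated with a small sketch. The abstract does not give the weighting scheme, so the choices below are assumptions made only for illustration: a term's affinity to a profile is taken as the share of that term's occurrences found in training documents of that profile, and a document vector is the plain average of its known term vectors. The function names, profile labels, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def term_profile_vectors(docs, labels):
    """Map each term to a vector of affinities over the profiles.

    Assumption (not from the paper): affinity is the share of the
    term's occurrences that fall in documents of each profile.
    """
    profiles = sorted(set(labels))
    counts = defaultdict(Counter)  # term -> profile -> occurrence count
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1
    vectors = {}
    for term, per_profile in counts.items():
        total = sum(per_profile.values())
        vectors[term] = [per_profile[p] / total for p in profiles]
    return profiles, vectors


def document_profile_vector(tokens, profiles, vectors):
    """Average the term vectors of the known tokens.

    The result has one dimension per profile, so it stays dense and
    low-dimensional regardless of vocabulary size.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * len(profiles)
    return [sum(v[i] for v in known) / len(known) for i in range(len(profiles))]


# Toy training data with hypothetical age-group labels.
train_docs = [
    ["lol", "school", "homework"],
    ["mortgage", "work", "meeting"],
    ["school", "lol", "party"],
    ["work", "taxes", "mortgage"],
]
train_labels = ["10s", "30s", "10s", "30s"]

profiles, vectors = term_profile_vectors(train_docs, train_labels)
mixed = document_profile_vector(["lol", "mortgage"], profiles, vectors)
young = document_profile_vector(["school", "party", "unseen"], profiles, vectors)
```

With two profiles, every document is described by two numbers, whereas a Bag-of-Terms vector has one dimension per vocabulary term. The resulting vectors could then be passed to any standard classifier.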


Similar resources

Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.


Readability for Author Profiling? Notebook for PAN at CLEF 2013

This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, their origins in thinking about text readability as a mechanism for identification, and the predictive model used, which may have beneficially omitted classes. It also offers commentary on the results obtained.


Style-based Distance Features for Author Profiling Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation in the PAN 2013 Author Identification task. It relies on a complex process to select the features which represent the author’s writing, using potentially multiple statistics and distance measures computed from the training set.


Ensemble-based Classification for Author Profiling Using Various Features Notebook for PAN at CLEF 2013

This paper summarizes our approach to the author profiling task, a part of the PAN’13 evaluation lab. We used ensemble-based classification on a large feature set. All the features are briefly described, and the experimental section provides an evaluation of different methods and classification approaches.


Using Simple Content Features for the Author Profiling Task Notebook for PAN at CLEF 2013

This paper describes the methods we have employed to solve the author profiling task at PAN-2013. Our goal was to use simple features to identify the age group and the gender of the author of a given text. We introduce the features, detail how the classifiers were trained, and how the experiments were run.



Publication date: 2013